Individual Assignment #2: ARIMA Lab.

Due: Nov. 23 before class time

(40 points)


The file titled US Electricity.csv includes a time series index compiled by the US Federal Reserve representing total fossil-fuel US electricity generation by all utilities from January 1939 through October 2021.

In the following code box, we read the CSV file, set up the data as a tsibble, plot it, and subset it for examination.

We are interested in developing a two-year monthly forecast (24 months) of national electricity production requirements.

  1. Examine the stationarity of the ELEC time series in the reduced DR data. Also examine the corresponding ACF and PACF diagrams, and propose three plausible ARIMA models to fit the data.

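For this time series we will try the following nonseasonal orders (from earlier testing, we already know that (1,0,0) will be selected automatically):

ARIMA(0,1,0) # random walk

ARIMA(0,0,1) # first-order moving average, MA(1)

ARIMA(3,1,0) # AR(3) on the first-differenced series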
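
The stationarity check and the ACF/PACF inspection in (1) can be sketched with the feasts helpers loaded by fpp3. This is a minimal sketch, not the code used for the assignment: the `DR` object below is a simulated stand-in, and the index name `Month` is an assumption.

```r
library(fpp3)

# Hypothetical stand-in for the assignment's DR tsibble (simulated, NOT the Fed index).
# Replace it with the real DR built from US Electricity.csv.
set.seed(1)
DR <- tsibble(
  Month = yearmonth("2010 Jan") + 0:143,
  ELEC  = 100 + 10 * sin(2 * pi * (1:144) / 12) +
    as.numeric(arima.sim(list(ar = 0.4), n = 144, sd = 3)),
  index = Month
)

# KPSS test plus the suggested number of seasonal and first differences
ur <- DR %>%
  features(ELEC, list(unitroot_kpss, unitroot_nsdiffs, unitroot_ndiffs))
ur

# Time plot, ACF and PACF of the seasonally differenced series
DR %>%
  gg_tsdisplay(difference(ELEC, lag = 12), plot_type = "partial", lag_max = 36)
```

A small KPSS p-value points to non-stationarity. The spikes in the ACF and PACF of the differenced series are what motivate candidate orders such as the three listed above.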

  2. Using fable, fit the following five models to the DR data: (i)-(iii) the three models you proposed in (1), (iv) the model automatically selected by the ARIMA() function, and (v) the model automatically selected by the ETS() function. Report the name/order of each model and the corresponding AICc and BIC.
## Series: ELEC 
## Model: ARIMA(0,1,0)(2,1,0)[12] 
## 
## Coefficients:
##          sar1     sar2
##       -0.3561  -0.3160
## s.e.   0.0906   0.0893
## 
## sigma^2 estimated as 15.61:  log likelihood=-358.21
## AIC=722.42   AICc=722.61   BIC=730.97
## Series: ELEC 
## Model: ARIMA(0,0,1)(2,1,1)[12] 
## 
## Coefficients:
##          ma1    sar1     sar2     sma1
##       0.3538  0.1687  -0.3520  -0.8514
## s.e.  0.0758  0.1306   0.1126   0.2248
## 
## sigma^2 estimated as 8.5:  log likelihood=-329.61
## AIC=669.22   AICc=669.71   BIC=683.52
## Series: ELEC 
## Model: ARIMA(3,1,0)(2,1,0)[12] 
## 
## Coefficients:
##           ar1      ar2      ar3     sar1     sar2
##       -0.4892  -0.3676  -0.1726  -0.3371  -0.4363
## s.e.   0.0901   0.0982   0.0902   0.0869   0.0874
## 
## sigma^2 estimated as 12.52:  log likelihood=-343.88
## AIC=699.75   AICc=700.44   BIC=716.86
## Series: ELEC 
## Model: ARIMA(1,0,0)(2,1,0)[12] 
## 
## Coefficients:
##          ar1     sar1     sar2
##       0.3757  -0.3375  -0.4386
## s.e.  0.0877   0.0834   0.0870
## 
## sigma^2 estimated as 10.76:  log likelihood=-337.76
## AIC=683.53   AICc=683.85   BIC=694.97
## Series: ELEC 
## Model: ETS(M,N,A) 
##   Smoothing parameters:
##     alpha = 0.2924178 
##     gamma = 0.0001000089 
## 
##   Initial states:
##      l[0]     s[0]     s[-1]     s[-2]     s[-3]    s[-4]    s[-5]      s[-6]
##  101.0318 7.817185 -6.612555 -11.00125 -2.759789 8.873747 10.47588 -0.3751617
##      s[-7]     s[-8]     s[-9]   s[-10]   s[-11]
##  -11.78396 -13.88839 -2.460821 6.012828 15.70228
## 
##   sigma^2:  9e-04
## 
##      AIC     AICc      BIC 
## 1019.152 1022.992 1063.383
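
A fit of this kind could be produced along the following lines. This is a hedged sketch: the labels `m1` to `m5` are illustrative, `DR` is a simulated stand-in, and leaving the seasonal `PDQ()` orders to `ARIMA()` is an assumption suggested by the seasonal terms in the output above.

```r
library(fpp3)

# Hypothetical stand-in for the assignment's DR tsibble (simulated, NOT the Fed index)
set.seed(1)
DR <- tsibble(
  Month = yearmonth("2010 Jan") + 0:143,
  ELEC  = 100 + 10 * sin(2 * pi * (1:144) / 12) +
    as.numeric(arima.sim(list(ar = 0.4), n = 144, sd = 3)),
  index = Month
)

fit <- DR %>%
  model(
    m1 = ARIMA(ELEC ~ pdq(0, 1, 0)),  # nonseasonal orders fixed, seasonal part selected by ARIMA()
    m2 = ARIMA(ELEC ~ pdq(0, 0, 1)),
    m3 = ARIMA(ELEC ~ pdq(3, 1, 0)),
    m4 = ARIMA(ELEC),                 # fully automatic ARIMA
    m5 = ETS(ELEC)                    # fully automatic ETS
  )

fit  # printing the mable shows the order/type chosen for each model

# Information criteria, one row per model
ic <- glance(fit) %>% select(.model, AICc, BIC)
ic

# Full coefficient report for a single model
fit %>% select(m4) %>% report()
```

When reading the table, keep in mind that information criteria are only directly comparable between ARIMA models with the same differencing orders, and not between ARIMA and ETS models.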
  3. Examine the residuals of all the models using the Ljung-Box test and the gg_tsresiduals() function. Is there a validity problem with any of the models?

For models 2 and 3, the residuals do not look like pure white noise: some residual autocorrelations are larger than would be expected by chance. This suggests a slight lack of fit for those two models.
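
The residual checks behind this answer can be sketched as follows. The sketch uses a simulated stand-in for `DR`, and the choice of 24 lags is an assumption, not something the assignment specifies.

```r
library(fpp3)

# Hypothetical stand-in for the assignment's DR tsibble (simulated, NOT the Fed index)
set.seed(1)
DR <- tsibble(
  Month = yearmonth("2010 Jan") + 0:143,
  ELEC  = 100 + 10 * sin(2 * pi * (1:144) / 12) +
    as.numeric(arima.sim(list(ar = 0.4), n = 144, sd = 3)),
  index = Month
)

fit <- DR %>%
  model(
    m2 = ARIMA(ELEC ~ pdq(0, 0, 1)),
    m3 = ARIMA(ELEC ~ pdq(3, 1, 0))
  )

# Ljung-Box test on the innovation residuals, one row per model
lb <- augment(fit) %>%
  features(.innov, ljung_box, lag = 24)
lb

# Residual time plot, ACF and histogram; gg_tsresiduals() takes one model at a time
fit %>% select(m2) %>% gg_tsresiduals(lag_max = 36)
```

A small `lb_pvalue` indicates residual autocorrelation that white noise would not produce. For a stricter test, `ljung_box` also accepts a `dof` argument, usually set to the number of estimated ARMA coefficients (for example 4 for model 2 and 5 for model 3 in the output reported in (2)).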

  4. For the set of five models selected (automatically and/or manually), examine the in-sample accuracy metrics. Based on a holistic analysis of the information criteria, select the best two ARIMA models and the ETS model. Report the model names/orders and their parameter values.

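The best two ARIMA models are:

m2: ARIMA(0,0,1)(2,1,1)[12], with ma1 = 0.3538, sar1 = 0.1687, sar2 = -0.3520, sma1 = -0.8514 (AICc = 669.71)

m4: ARIMA(1,0,0)(2,1,0)[12], with ar1 = 0.3757, sar1 = -0.3375, sar2 = -0.4386 (AICc = 683.85)

The ETS model is ETS(M,N,A), with smoothing parameters alpha = 0.2924178 and gamma = 0.0001000089.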
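
The in-sample accuracy metrics and the parameter values asked for in (4) can be extracted as in the sketch below. It again uses a simulated stand-in for `DR` and illustrative model labels.

```r
library(fpp3)

# Hypothetical stand-in for the assignment's DR tsibble (simulated, NOT the Fed index)
set.seed(1)
DR <- tsibble(
  Month = yearmonth("2010 Jan") + 0:143,
  ELEC  = 100 + 10 * sin(2 * pi * (1:144) / 12) +
    as.numeric(arima.sim(list(ar = 0.4), n = 144, sd = 3)),
  index = Month
)

fit <- DR %>%
  model(
    m1 = ARIMA(ELEC ~ pdq(0, 1, 0)),
    m2 = ARIMA(ELEC ~ pdq(0, 0, 1)),
    m3 = ARIMA(ELEC ~ pdq(3, 1, 0)),
    m4 = ARIMA(ELEC),
    m5 = ETS(ELEC)
  )

# Training-set (in-sample) accuracy, one row per model
acc <- accuracy(fit) %>% select(.model, RMSE, MAE, MAPE, MASE)
acc

# Parameter estimates of the shortlisted ARIMA models and of the ETS model
fit %>% select(m2, m4) %>% tidy() %>% select(.model, term, estimate)
fit %>% select(m5) %>% tidy()
```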

For model cross-validation purposes, stretch the DR data as follows:

  5. Fit cross-validation models for each of the time sub-series in the stretched data for each of the four model types selected in (4). Where a model was automatically selected, do NOT rerun the automatic selection under cross-validation; instead, manually enter the model order/type when you call the ARIMA()/ETS() function.

  6. Prepare a 24-month-ahead forecast for each of the models fitted in (5), and plot MAPE vs. months ahead. Based on the dynamic behavior of the cross-validation MAPE, discuss which model(s) should be kept/discarded.

Based on the MAPE vs. h (months-ahead) plot, we discard the ETS model because the ARIMA models achieve a much lower cross-validation MAPE.
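
Steps (5) and (6) can be sketched as below. The `.init` and `.step` values are hypothetical, because the assignment's own stretch settings are not reproduced here, and `DR` is a simulated stand-in. The model orders are entered manually, as reported in (4). The `0 +` term omits the constant, which matches the reported fits, since none of them shows a constant.

```r
library(fpp3)

# Hypothetical stand-in for the assignment's DR tsibble (simulated, NOT the Fed index)
set.seed(1)
DR <- tsibble(
  Month = yearmonth("2010 Jan") + 0:143,
  ELEC  = 100 + 10 * sin(2 * pi * (1:144) / 12) +
    as.numeric(arima.sim(list(ar = 0.4), n = 144, sd = 3)),
  index = Month
)

# Expanding-window training sets; .init and .step are assumed values
DR_cv <- DR %>% stretch_tsibble(.init = 96, .step = 6)

# Orders/types entered manually, with no automatic selection under cross-validation
cv_fit <- DR_cv %>%
  model(
    m2  = ARIMA(ELEC ~ 0 + pdq(0, 0, 1) + PDQ(2, 1, 1)),
    m4  = ARIMA(ELEC ~ 0 + pdq(1, 0, 0) + PDQ(2, 1, 0)),
    ets = ETS(ELEC ~ error("M") + trend("N") + season("A"))
  )

# 24-month-ahead forecasts, tagged with the horizon h within each fold and model
cv_fc <- cv_fit %>%
  forecast(h = 24) %>%
  group_by(.id, .model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  as_fable(response = "ELEC", distribution = ELEC)

# Out-of-sample MAPE by horizon and model (forecasts past the end of DR are ignored)
cv_acc <- cv_fc %>% accuracy(DR, by = c("h", ".model"))

ggplot(cv_acc, aes(x = h, y = MAPE, colour = .model)) +
  geom_line() +
  labs(x = "Months ahead (h)", y = "Cross-validation MAPE (%)")
```

A model whose MAPE curve sits above the others at most horizons, or rises faster as h grows, is the natural candidate to discard.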

  7. Examine the cross-validation residuals of the models you selected in (6) and, based on their correlation (model vs. model), discuss whether it is advisable to prepare an ensemble forecast averaging the forecasts of two or more models.

It may be worth creating an ensemble forecast, because averaging is likely to produce a more robust and consistent result. The plots from (6) suggest that the two top ARIMA models could complement each other, each offsetting the other's weaknesses or inconsistencies.
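
Why the model-vs-model correlation matters can be made explicit with a short derivation. The symbols here are illustrative assumptions, not part of the assignment: let $e_1$ and $e_2$ be the unbiased cross-validation errors of two models, with variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$. The error of the equal-weight ensemble is their average.

```latex
\operatorname{Var}\!\left(\frac{e_1 + e_2}{2}\right)
  = \frac{\sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2}{4},
\qquad
\text{and for } \sigma_1 = \sigma_2 = \sigma:\quad
\operatorname{Var}\!\left(\frac{e_1 + e_2}{2}\right) = \frac{1+\rho}{2}\,\sigma^2 .
```

Under these assumptions, averaging two equally accurate models brings almost no gain when $\rho$ is close to 1, and the gain grows as the correlation between their errors falls.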

  8. The index is very useful for energy planning purposes, as most of the variability and seasonality is produced by combined-cycle natural gas plants and single-cycle peaker plants that also run on natural gas (i.e., nuclear and coal generation is fixed and relatively constant). For this purpose, it is of interest to know the production index level that will not be exceeded with a probability (service level) of 95%.

For the best model in (6), plot the 24-month-ahead forecast together with the corresponding confidence interval to help you address the service-level question. Report numerically, month by month, the index forecasts that meet the desired 95% service level.
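
One way to read the service level off the forecast distribution is sketched below. The upper limit of a two-sided 90% prediction interval is the 95th percentile of the forecast distribution, that is, the level not exceeded with probability 95%. The sketch assumes that interpretation, uses `m4` purely as a placeholder for whichever model is judged best in (6), and runs on a simulated stand-in for `DR`.

```r
library(fpp3)

# Hypothetical stand-in for the assignment's DR tsibble (simulated, NOT the Fed index)
set.seed(1)
DR <- tsibble(
  Month = yearmonth("2010 Jan") + 0:143,
  ELEC  = 100 + 10 * sin(2 * pi * (1:144) / 12) +
    as.numeric(arima.sim(list(ar = 0.4), n = 144, sd = 3)),
  index = Month
)

# Placeholder for the best model from (6); orders as reported for m4, no constant
best <- DR %>%
  model(m4 = ARIMA(ELEC ~ 0 + pdq(1, 0, 0) + PDQ(2, 1, 0)))

fc <- best %>% forecast(h = 24)

# Forecast plot with the 90% prediction interval; its upper edge is the 95% service level
fc %>% autoplot(DR, level = 90)

# Month-by-month point forecast and the index level not exceeded with probability 95%
service <- fc %>%
  hilo(level = 90) %>%
  mutate(p95 = `90%`$upper) %>%
  as_tibble() %>%
  select(Month, .mean, p95)
service
```

If the assignment instead intends the upper limit of the 95% interval, changing `level = 90` to `level = 95` (and the column name to `95%`) gives that bound, which corresponds to a 97.5% one-sided level.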